The code in this replication package constructs the analysis file from the 50 data sources using R. Six main files run all of the code to generate the data for the 13 figures and tables in the paper and appendix. The replicator should expect the code to run for about 45 minutes.

The data used to conduct the analysis in this paper and appendix were obtained from NORC at the University of Chicago. They comprise 50 datasets from survey experiments conducted as part of the Time-sharing Experiments in Social Science program (tessexperiments.org). The studies we used are listed in Table D2 in the appendix. I certify that the authors of the manuscript have legitimate access to and permission to use the data used in this manuscript. All data are publicly available.

The main data source is "pooled_data.csv", the dataset containing all unique respondents from the 50 studies. It is provided in the data/analysis folder.
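
A minimal sketch of loading this file and running basic sanity checks before any analysis. It uses column names documented in the Dataset List section of this README. The function name check_pooled, the 0/1 coding of "D1", and the path data/analysis/pooled_data.csv (the output location given in the code descriptions) are assumptions; adjust them if your copy differs.

```r
# Columns documented in this README that the checks below rely on.
key_columns <- c("StudyId", "CaseId", "WEIGHT", "Y1", "D1", "TNRFU")

# Hypothetical helper: returns TRUE if the data frame passes basic checks
# and stops with an informative message otherwise.
check_pooled <- function(df) {
  missing_cols <- setdiff(key_columns, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # "D1" is described as a binary indicator; 0/1 coding is an assumption.
  if (!all(df$D1 %in% c(0, 1, NA))) {
    stop("D1 contains values other than 0, 1, or NA")
  }
  TRUE
}

# Assumed location of the pooled file, relative to the project root.
pooled_path <- file.path("data", "analysis", "pooled_data.csv")
if (file.exists(pooled_path)) {
  pooled <- read.csv(pooled_path)
  check_pooled(pooled)
  # Number of respondents per study.
  print(table(pooled$StudyId))
}
```

The file.exists() guard lets the block run without error from any directory; the checks only execute when the file is found.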

The data/analysis folder also contains the 50 individual datasets, named according to the structure "TESS_000_RESEARCHERLASTNAME_CLEANED", where 000 stands for the three-digit study number. For example, the first dataset is "TESS_001_SHANNON_CLEANED.csv".
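
The naming structure can be sketched in R as follows. The helper cleaned_file_name and the regular expression are illustrative and not part of the package; the pattern assumes last names consist only of capital letters A-Z, which may not hold for every study.

```r
# Hypothetical helper: build the expected file name for a study number
# and researcher last name. The ".csv" extension follows the example above.
cleaned_file_name <- function(study_id, last_name) {
  sprintf("TESS_%03d_%s_CLEANED.csv", as.integer(study_id), toupper(last_name))
}

# Regular expression for names of that shape (assumes A-Z last names).
cleaned_pattern <- "^TESS_[0-9]{3}_[A-Z]+_CLEANED\\.csv$"

# Reproduces the example file name given above.
print(cleaned_file_name(1, "Shannon"))

# List matching files; returns character(0) if the folder is absent.
cleaned_files <- list.files("data/analysis", pattern = cleaned_pattern)
print(length(cleaned_files))
```

With the full package in place, one would expect the listing to contain 50 files; a different count suggests that the data processing scripts have not all run or that the pattern is too strict.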

A dataset called "heterogeneity_stats.csv" contains information from random forest models and causal random forest models run on the 50 studies.

The .csv data used to support the findings of this study have been deposited in the Political Analysis Dataverse repository (https://doi.org/10.7910/DVN/N15MSX). 


Descriptions of Code

- Code in code/data_processing/compiling_pooled_dataset.R will read and compile the 50 datasets into one containing all unique respondents; it will output the "pooled_data.csv" data file into data/analysis.
- Code in code/data_processing/prepping_50_cleaned_datasets.R will read and generate the "primary" treatment-indicator variable and outcome variable for the 50 datasets each separately; it will output the "cleaned" data files into data/analysis.
- Code in code/analysis/Table_1.R will produce 2 files: "Table_1_counts.tex" corresponds to the first row of Table 1; "Table_1_proportions" to the second row.
- Code in code/analysis/Table_2+Figure_H5.R will produce four files for Table 2: "Table_2_pt1.tex" (Rows 1 and 2), "Table_2_pt2.tex" (the out-of-bag error rate), "Table_2_pt3.tex" (the test-set error rate), and "Table_2_pt4.tex" (the AUROC). It will also output Appendix Figure H5.
- Code in code/analysis/Figure_2+Table_J6.R will produce Figure 2 and Appendix Table J6. 
- Code in code/analysis/Figure_3+Table_F3.R will produce Figure 3 and Appendix Table F3.
- Code in code/analysis/Figure_4+Table_I5.R will produce Figure 4 and Appendix Table I5.

- Code in code/analysis/Table_A1.R will produce Appendix Table A1.
- Code in code/analysis/Figures_G1+G2+Table_G4.R will produce Appendix Figures G1 and G2 and Appendix Table G4.
- Code in code/analysis/Figure_H3.R will produce Appendix Figure H3.
- Code in code/analysis/Figure_H4.R will produce Appendix Figure H4 in two parts, "Figure_H4_pt1.pdf" and "Figure_H4_pt2.pdf", because it is too large for one page.

Dataset List

Information for "pooled_data.csv"
- "StudyId" is the study's number; it is a unique identifier used throughout this project.
- "WEIGHT" is NORC's survey weight.
- "CaseId" is NORC's study-specific case identifier.
- "AGE" is age in years.
- "GENDER" is respondent gender.
- "RACETHNICITY" is respondent race and ethnicity.
- "EDUC" is respondent education level.
- "MARITAL" is marital status.
- "EMPLOY" is employment status.
- "INCOME" is income level.
- "STATE" is state of residence.
- "METRO" is a binary indicator for residence in a metropolitan area.
- "INTERNET" indicates respondent internet access.
- "PartyID7" is 7-point party identification.
- "ATTEND" is frequency of religious attendance.
- "HOUSING" is whether the respondent owns or rents their home.
- "HOME_TYPE" is the type of dwelling the respondent lives in.
- "PHONESERVICE" is the type of phone the respondent uses, whether mobile or landline.
- "HHSIZE" is the household size.
- "Y1" is the standardized dependent variable from the study's "primary" analysis.
- "D1" is the binary treatment-status indicator for that analysis.
- "TNRFU" is the non-response follow-up indicator.
- "study_year" is the year the study took place.
- "birth_year" is the respondent's year of birth.

Information for "heterogeneity_stats.csv"
- "Study" identifies the study that the row describes.
- "ATE.eager" is the average treatment effect among eager respondents.
- "ATE.reluctant" is the average treatment effect among reluctant respondents.
- "r.tau.NRFUhat" is the estimated correlation between the individual-level treatment effects and the predicted NRFU probabilities.
- "r.tau.NRFUhat.LB" is the lower bound of the 95% confidence interval of that estimate; "r.tau.NRFUhat.UB" is the upper bound.
- "d.forest.pred.p" is the p-value associated with the differential forest prediction from the causal random forest model.
- "mean.forest.pred" is the mean forest prediction from that model.

Information for raw datasets in data/raw
These are the 50 TESS survey experiment data files given to us by NORC at the University of Chicago. They are identical to the publicly available files for the studies except for one additional variable, "TNRFU", which indicates whether the respondent was recruited via non-response follow-up (NRFU).

Information for raw datasets in data/extra
This folder holds one additional data file for a single study (no. 9). Because of its experimental design, that study came with two files that need to be merged.

Instructions to Replicators

Use the "master.R" file to install the necessary packages. 
Running the code in the "master.R" file will output the data files, figures, and tables. 
If you want to create figures and/or tables from their respective R scripts, you can do so, but you must ensure the working directory is set to the folder containing "master.R". 
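
One way to satisfy the working-directory requirement when running a single script is sketched below. The helper run_from_root is hypothetical and not part of the package; it also does not install or load packages, so it assumes the script handles its own dependencies or that the packages from "master.R" are already installed.

```r
# Hypothetical helper: run one script with the working directory
# temporarily set to the folder that contains "master.R".
run_from_root <- function(script, root = ".") {
  if (!file.exists(file.path(root, "master.R"))) {
    stop("'master.R' not found in ", normalizePath(root, mustWork = FALSE))
  }
  old_wd <- setwd(root)
  # Restore the previous working directory even if the script fails.
  on.exit(setwd(old_wd), add = TRUE)
  # The script path is interpreted relative to the root folder.
  source(script)
  invisible(TRUE)
}

# Example call (the root path is a placeholder):
# run_from_root("code/analysis/Table_1.R", root = "path/to/replication_package")
```

The check for "master.R" fails early with a clear message instead of letting the script stop later on a missing relative path.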

Software Requirements

R 4.3.2
tidyverse 2.0.0
xtable 1.8-4
broom 1.0.5
estimatr 1.0.2
deming 1.4
rsample 1.2.0
ggpubr 0.6.0
randomForest 4.7-1.1
grf 2.3.2
gt 0.10.1
gtsummary 1.7.2
radiant 1.6.1
rio 1.0.1
datawizard 0.9.1
Hmisc 5.1-1
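
A short sketch for comparing installed package versions with the list above. The helper version_report is illustrative and not part of the package. A mismatch does not necessarily mean the code will fail, only that the environment differs from the one in which the code was last run.

```r
# Package versions copied from the list above (R itself is checked separately).
required <- c(
  tidyverse = "2.0.0", xtable = "1.8-4", broom = "1.0.5", estimatr = "1.0.2",
  deming = "1.4", rsample = "1.2.0", ggpubr = "0.6.0",
  randomForest = "4.7-1.1", grf = "2.3.2", gt = "0.10.1",
  gtsummary = "1.7.2", radiant = "1.6.1", rio = "1.0.1",
  datawizard = "0.9.1", Hmisc = "5.1-1"
)

# Hypothetical helper: one row per package with the required version,
# the installed version (NA if not installed), and whether they match.
version_report <- function(required) {
  rows <- lapply(names(required), function(pkg) {
    is_installed <- nzchar(system.file(package = pkg))
    installed <- if (is_installed) utils::packageVersion(pkg) else NULL
    data.frame(
      package = pkg,
      required = required[[pkg]],
      installed = if (is_installed) as.character(installed) else NA_character_,
      # Compare as version objects so "1.8-4" and "1.8.4" count as equal.
      matches = is_installed && installed == package_version(required[[pkg]])
    )
  })
  do.call(rbind, rows)
}

print(version_report(required))
# R version of the current session, for comparison with R 4.3.2.
print(getRversion())
```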

The code was last run on an Apple laptop with an M2 Pro chip (19-core GPU) and 16 GB RAM, running macOS 14.2.1.